Goto

Collaborating Authors

 aggregation operator


Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Neural Information Processing Systems

Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers.



Sequential-Parallel Duality in Prefix Scannable Models

arXiv.org Artificial Intelligence

Modern neural sequence models are designed to meet the dual mandate of parallelizable training and fast sequential inference. Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba, that achieve such ``sequential-parallel duality.'' This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference? We begin by describing a broad class of such models -- state space models -- as those whose state updates can be computed using the classic parallel prefix scan algorithm with a custom associative aggregation operator. We then define a more general class, Prefix-Scannable Models (PSMs), by relaxing the state aggregation operator to allow arbitrary (potentially non-associative) functions such as softmax attention. This generalization unifies many existing architectures, including element-wise RNNs (e.g., Mamba) and linear transformers (e.g., GLA, Mamba2, mLSTM), while also introducing new models with softmax-like operators that achieve O(1) amortized compute per token and log(N) memory for sequence length N. We empirically evaluate such models on illustrative small-scale language modeling and canonical synthetic tasks, including state tracking and associative recall. Empirically, we find that PSMs retain the expressivity of transformer-based architectures while matching the inference efficiency of state space models -- in some cases exhibiting better length generalization than either.


packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

arXiv.org Artificial Intelligence

We study the online learning problem characterized by the varying input feature space of streaming data. Although LSTMs have been employed to effectively capture the temporal nature of streaming data, they cannot handle the dimension-varying streams in an online learning setting. Therefore, we propose a dynamic LSTM-based novel method, called packetLSTM, to model the dimension-varying streams. The packetLSTM's dynamic framework consists of an evolving packet of LSTMs, each dedicated to processing one input feature. Each LSTM retains the local information of its corresponding feature, while a shared common memory consolidates global information. This configuration facilitates continuous learning and mitigates the issue of forgetting, even when certain features are absent for extended time periods. The idea of utilizing one LSTM per feature coupled with a dimension-invariant operator for information aggregation enhances the dynamic nature of packetLSTM. This dynamic nature is evidenced by the model's ability to activate, deactivate, and add new LSTMs as required, thus seamlessly accommodating varying input dimensions. The packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.


On Globular T-Spherical Fuzzy (G-TSF) Sets with Application to G-TSF Multi-Criteria Group Decision-Making

arXiv.org Artificial Intelligence

In this paper, we give the concept of Globular T-Spherical Fuzzy (G-TSF) Sets (G-TSFSs) as an innovative extension of T-Spherical Fuzzy Sets (TSFSs) and Circular Spherical Fuzzy Sets (C-SFSs). G-TSFSs represent membership, indeterminacy, and non-membership degrees using a globular/sphere bound that can offer a more accurate portrayal of vague, ambiguous, and imprecise information. By employing a structured representation of data points on a sphere with a specific center and radius, this model enhances decision-making processes by enabling a more comprehensive evaluation of objects within a flexible region. Following the newly defined G-TSFSs, we establish some basic set operations and introduce fundamental algebraic operations for G-TSF Values (G-TSFVs). These operations expand the evaluative capabilities of decision-makers, facilitating more sensitive decision-making processes in a broader region. To quantify a similarity measure (SM) between GTSFVs, the SM is defined based on the radius of G-TSFSs. Additionally, Hamming distance and Euclidean distance are introduced for G-TSFSs. We also present theorems and examples to elucidate computational mechanisms. Furthermore, we give the G-TSF Weighted Average (G-TSFWA) and G-TSF Weighted Geometric (G-TSFWG) operators. Leveraging our proposed SM, a Multi-Criteria Group Decision-Making (MCGDM) scheme for G-TSFSs, named G-TSF MCGDM (G-TSFMCGDM), is developed to address group decision-making problems. The applicability and effectiveness of the proposed G-TSFMCGDM method are demonstrated by applying it to solve the selection problem of the best venue for professional development training sessions in a firm. The analysis results affirm the suitability and utility of the proposed method for resolving MCGDM problems, establishing its effectiveness in practical decision-making scenarios.


Demystifying Oversmoothing in Attention-Based Graph Neural Networks

arXiv.org Machine Learning

Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers.


Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

arXiv.org Artificial Intelligence

This article derives and validates three principles for initialization and architecture selection in finite width graph neural networks (GNNs) with ReLU activations. First, we theoretically derive what is essentially the unique generalization to ReLU GNNs of the well-known He-initialization. Our initialization scheme guarantees that the average scale of network outputs and gradients remains order one at initialization. Second, we prove in finite width vanilla ReLU GNNs that oversmoothing is unavoidable at large depth when using fixed aggregation operator, regardless of initialization. We then prove that using residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, provably alleviates oversmoothing at initialization. Finally, we show that the common practice of using residual connections with a fixup-type initialization provably avoids correlation collapse in final layer features at initialization. Through ablation studies we find that using the correct initialization, residual aggregation operators, and residual connections in the forward pass significantly and reliably speeds up early training dynamics in deep ReLU GNNs on a variety of tasks.


The Joint Weighted Average (JWA) Operator

arXiv.org Artificial Intelligence

Information aggregation is a vital tool for human and machine decision making, especially in the presence of noise and uncertainty. Traditionally, approaches to aggregation broadly diverge into two categories, those which attribute a worth or weight to information sources and those which attribute said worth to the evidence arising from said sources. The latter is pervasive in particular in the physical sciences, underpinning linear order statistics and enabling non-linear aggregation. The former is popular in the social sciences, providing interpretable insight on the sources. Thus far, limited work has sought to integrate both approaches, applying either approach to a different degree. In this paper, we put forward an approach which integrates--rather than partially applies--both approaches, resulting in a novel joint weighted averaging operator. We show how this operator provides a systematic approach to integrating a priori beliefs about the worth of both source and evidence by leveraging compositional geometry--producing results unachievable by traditional operators. We conclude and highlight the potential of the operator across disciplines, from machine learning to psychology.


Circular Pythagorean fuzzy sets and applications to multi-criteria decision making

arXiv.org Artificial Intelligence

In this paper, we introduce the concept of circular Pythagorean fuzzy set (value) (C-PFS(V)) as a new generalization of both circular intuitionistic fuzzy sets (C-IFSs) proposed by Atannassov and Pythagorean fuzzy sets (PFSs) proposed by Yager. A circular Pythagorean fuzzy set is represented by a circle that represents the membership degree and the non-membership degree and whose center consists of non-negative real numbers $\mu$ and $\nu$ with the condition $\mu^2+\nu^2\leq 1$. A C-PFS models the fuzziness of the uncertain information more properly thanks to its structure that allows modelling the information with points of a circle of a certain center and a radius. Therefore, a C-PFS lets decision makers to evaluate objects in a larger and more flexible region and thus more sensitive decisions can be made. After defining the concept of C-PFS we define some fundamental set operations between C-PFSs and propose some algebraic operations between C-PFVs via general $t$-norms and $t$-conorms. By utilizing these algebraic operations, we introduce some weighted aggregation operators to transform input values represented by C-PFVs to a single output value. Then to determine the degree of similarity between C-PFVs we define a cosine similarity measure based on radius. Furthermore, we develop a method to transform a collection of Pythagorean fuzzy values to a PFS. Finally, a method is given to solve multi-criteria decision making problems in circular Pythagorean fuzzy environment and the proposed method is practiced to a problem about selecting the best photovoltaic cell from the literature. We also study the comparison analysis and time complexity of the proposed method.


Admissibility in Strength-based Argumentation: Complexity and Algorithms (Extended Version with Proofs)

arXiv.org Artificial Intelligence

Recently, Strength-based Argumentation Frameworks (StrAFs) have been proposed to model situations where some quantitative strength is associated with arguments. In this setting, the notion of accrual corresponds to sets of arguments that collectively attack an argument. Some semantics have already been defined, which are sensitive to the existence of accruals that collectively defeat their target, while their individual elements cannot. However, until now, only the surface of this framework and semantics have been studied. Indeed, the existing literature focuses on the adaptation of the stable semantics to StrAFs. In this paper, we push forward the study and investigate the adaptation of admissibility-based semantics. Especially, we show that the strong admissibility defined in the literature does not satisfy a desirable property, namely Dung's fundamental lemma. We therefore propose an alternative definition that induces semantics that behave as expected. We then study computational issues for these new semantics, in particular we show that complexity of reasoning is similar to the complexity of the corresponding decision problems for standard argumentation frameworks in almost all cases. We then propose a translation in pseudo-Boolean constraints for computing (strong and weak) extensions. We conclude with an experimental evaluation of our approach which shows in particular that it scales up well for solving the problem of providing one extension as well as enumerating them all.